ABSTRACT


Australia is a large continent with a relatively small population, and government agencies and research institutions 
are devoting considerable resources to the development of new approaches and tools for conserving and managing 
Australia’s biodiversity. Issues of data quality, choice of analysis method, ecological theory, and GIS (geographic infor- 
mation system) use are discussed using examples from recent Australian studies with emphasis on the scientific 
components. The problem of data quality is examined in terms of a suitable minimum data set and the need for a
survey design for representative sampling using results from a survey of 24,000 km² in northeastern New South Wales.
Examples of analytical tools for modeling species distribution, e.g., generalized linear models (GLM) and generalized 
additive models (GAM), are presented using data from a database of 9537 plots and 273 tree species for an area of 
40,000 km² in southeastern New South Wales. The necessity for ecological theory, in particular continuum theory as
opposed to community concepts, is examined in the context of these results. The interface between ecological and 
evolutionary theory is discussed drawing on the results of statistical modeling (GLM) of species richness patterns of 
Eucalyptus subgenera in the same area. The predictive use of GIS in mapping vegetation, using statistical modeling 
(GAM) and multivariate classification techniques, is demonstrated with an application to a comprehensive regional 
assessment (CRA) process for establishing a regional conservation plan. These methods and analytical tools have been 
collated into a package, BioRap, which also includes methods for the selection of priority areas for conservation. Rapid 
progress is being made in developing new tools. However, theory for ecological, statistical, environmental, and evolutionary
processes is urgently needed to ensure effective use of these emerging tools for investigating and managing
biodiversity. 


A key issue facing society is how to conserve 
our global biodiversity. There is a need to use the
currently available information now in order to 
fill the gaps in our conservation strategies. Areas 
with complementary suites of species and/or rep- 
resentative types of ecosystems are required. 
There is also a need to constantly examine how 
to make better use of available data and to find 
better methods to convert data into useful information
for policy decisions on conservation. Scott
& Jennings (1998, this issue) present a detailed
account of one of the most comprehensive ap- 
proaches so far. 

Australia is a large continent with a small pop- 
ulation, and government agencies and research in- 
stitutions are devoting considerable resources to the 
development of new approaches for conserving and 
managing Australia’s biodiversity. To do this, Com- 
monwealth and State governments have developed 
major databases and Geographic Information Systems
(GIS) to provide biodiversity information on
the location, abundance, and dynamics of Austra- 
lia’s native flora and fauna, e.g., the Environmental 
Resources Information Network (ERIN; Chapman &
Busby, 1994). Key issues arising from the use of 
these tools are how best to answer policy questions, 
data quality, the suitability of analytic tools, the 
role of ecological theory, the predictive success of 
GIS, and how best to make methods available to 
the wider community of users. 

This paper focuses on Australian research in this 
area, in particular on improving information pro- 
vision methods using modern computer technology. 
The topics considered are: use of available data, 
such as herbarium records and vegetation survey 
data; design of surveys to obtain more cost-effective 
data; use of statistical modeling and GIS to predict 
species distributions and richness patterns from 
survey data; and the need to evaluate methodology 
against existing theory. 
DATA


Most developed countries are establishing data- 
bases of biotic data for creating and evaluating con- 
servation policies (Chapman & Busby, 1994; Scott 
& Jennings, 1998; Soberón et al., 1996). Similar
methods are also being adopted for developing 
countries (Hall, 1994). The data are usually based 
on herbarium or museum records. In Australia, the 
federal government has established the Environ- 
mental Resources Information Network (ERIN) to 
collate, organize, and provide access to the avail- 
able data. Maximizing the use of existing data is 
now critical as resources to re-collect data by 
means of surveys are very limited. As part of this 
effort the principal herbaria and botanic gardens in 
Australia have cooperated to produce a common 
standard for computer-based records systems for 
specimens. There is a working group that meets 
regularly to address ongoing applications issues. It 
is estimated to cost $6 Australian to database a 
single herbarium or museum record, but several 
times that to collect specimens using professional 
staff (Chapman & Busby, 1994). ERIN has devel- 
oped an extensive hardware and software system to 
support the aim of providing primary data to iden- 
tify and characterize regional environmental pat- 
terns for use in environmental assessment and 
planning. For handling taxonomic data, ERIN has 
developed modules for managing taxon names and 
easily updating them (Taxon), managing individual 
records of specimens (Specimen), and a Data Dic- 
tionary and catalogue module for managing data 
sets including custodianship. These modules and 
others are linked to a GIS to form what ERIN terms 
a Spatial Information System (SIS). Chapman and 
Busby (1994) provided further details of the sys- 
tem, and there is a website (http://www.erin.gov.au) 
that also provides a public access system for plant 
records. The system provides for all types of data 
and remote-sensing coverage of Australia, but the 
primary taxonomic record data are a key compo- 
nent. 

It is important, however, to recognize that her- 
barium records suffer from several weaknesses 
(Hall, 1994; Margules & Austin, 1994; Soberón et 
al., 1996): the records document presence only, and
there is no information about absence; the locations 
are often poorly recorded; the presence of other 
species and of environmental variables is inconsis- 
tently recorded; and the spatial distribution of spec- 
imens is highly biased. Figure 1 exemplifies the 
location bias of museum records; it shows the dis- 
tribution of all suitable records of elapid snakes in 
Australia (Longmore, 1986). The major roads in remote
areas of the continent are clearly outlined by
the record locations. With hindsight it is easy to be
critical, and such records were not intended to provide
definitive data for regional biogeographic or
conservation studies. However, when used for analysis
of areas of high biodiversity or endemism,
problems can arise; see Tuomisto (1998, this issue)
for an example from Amazonia. The minimum data
set needed for analysis is presence/absence data
and an accurate location for which environmental
data can be obtained via a map or GIS. Statistical
analysis is precluded by the lack of absence data.
How to use presence-only data is a serious problem
not always recognized by systematists (Soberón et
al., 1996; Margules & Redhead, 1995) when considering
conservation issues. However, herbarium
collections provide taxonomic precision and verifiable
voucher specimens, which vegetation surveys
usually lack.

Figure 1. Collection sites for all species in the Atlas
of Elapid Snakes of Australia (Longmore, 1986). Note the
alignment of sites along major roads, especially in the
Northern Territory. Reprinted with permission.

This has led to the development in Australia of 
two heuristic methods to make maximum use of 
presence data. The first, BIOCLIM (Nix, 1986; Bus- 
by, 1986, 1991; now termed BIOMAP (Hutchinson 
et al., 1997)), uses geocoded specimen records to- 
gether with estimates of a selected set of biocli- 
matic variables for the location. The estimates are 
derived from climatic surfaces calculated using rec- 
ords from climatic stations. These specimen records 
are used to estimate the range of each bioclimatic 
variable within which the species is found. For 
each location where a specimen of a species is re- 
corded, the climatic estimates are aggregated to 
provide a “climate profile” of the taxon. The values 
for each estimate are ranked in increasing order 
such that the minimum value, the 5th and 95th
percentiles, etc., can be defined. This has
been done for 12 bioclimatic variables to define the 
climatic profile (Busby, 1986). By describing the 
climatic profile for a species as the combination of 
climatic conditions lying between the 5th and 95th 
percentiles for 16 climatic variables, a climatic en- 
velope for the potential occurrence of a species is 
defined. From this profile, together with a grid of 
predictions of the bioclimatic variables for a region 
or continent, a map of the potential occurrence of 
a species can be generated based on climatic in- 
formation (Longmore, 1986; Hutchinson et al., 
1997). The prediction map is only of potential oc- 
currence because no information on absence is 
used, and there is no information on other environ- 
mental or historical factors that might control species
occurrence.
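
The percentile envelope described above can be illustrated with a minimal sketch. It is not the BIOCLIM code itself: the variable names (`mean_annual_temp`, `annual_rain`), the interpolation rule for percentiles, and the sample values are assumptions chosen only for illustration.

```python
from typing import Dict, List, Tuple


def percentile(values: List[float], q: float) -> float:
    """Percentile (q in 0..100) by linear interpolation between ranked values."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def climate_profile(records: List[Dict[str, float]],
                    lower: float = 5.0,
                    upper: float = 95.0) -> Dict[str, Tuple[float, float]]:
    """Summarize presence records as (lower, upper) percentile limits per variable."""
    profile = {}
    for name in records[0]:
        column = [r[name] for r in records]
        profile[name] = (percentile(column, lower), percentile(column, upper))
    return profile


def in_envelope(cell: Dict[str, float],
                profile: Dict[str, Tuple[float, float]]) -> bool:
    """A grid cell is potential habitat only if every variable lies within its limits."""
    return all(lo <= cell[name] <= hi for name, (lo, hi) in profile.items())


# Hypothetical climate estimates at five specimen locations (presence only).
presence = [
    {"mean_annual_temp": t, "annual_rain": r}
    for t, r in [(9.0, 900.0), (10.0, 1000.0), (11.0, 1100.0),
                 (12.0, 1200.0), (13.0, 1300.0)]
]
profile = climate_profile(presence)

# Hypothetical grid cells with climate predicted from a climatic surface.
grid = [
    {"mean_annual_temp": 11.5, "annual_rain": 1000.0},
    {"mean_annual_temp": 15.0, "annual_rain": 1000.0},
]
potential = [in_envelope(cell, profile) for cell in grid]
```

Because every variable is tested independently, the envelope is a multidimensional rectangle in climate space, and a cell flagged as suitable indicates potential occurrence only.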

There are four essential components to the pro- 
cedure: (1) a method to produce climatic estimates 
from the records of climate stations and measure- 
ments of latitude, longitude, and elevation (see 
Wahba & Wendelberger, 1980; Hutchinson & Bischof,
1983; Hutchinson, 1984); (2) existence of a
digital elevation model which can be used to gen- 
erate the climatic predictions for all points in the 
region; (3) a conceptual model for deciding on an 
appropriate set of bioclimatic variables relevant to 
the organisms being studied (Nix, 1986); and (4) a 
classification algorithm to define the bioclimatic en- 
velope. Hutchinson et al. (1997) provided an up- 
to-date presentation of all stages of the approach. 
Examples of the application of this method are 
Longmore (1986), for a continental study of elapid 
snakes; Nix and Switzer (1991), on the potential 
regional distribution of Australia’s rainforest vertebrate
fauna; Busby (1986) on distribution of Nothofagus
cunninghamii (Fagaceae); and Busby
(1988) on the impact of climate change on Austra- 
lia’s flora and fauna. 

A revised method, HABITAT, has been pub- 
lished (Walker & Cocks, 1991) that uses a polyg- 
onal rather than the cruder multidimensional rect- 
angular definition of the climatic envelope used by 
BIOCLIM. This procedure provides a more conser- 
vative (smaller) envelope that takes more account 
of the actual distribution of presence records in the 
climate space. It has been applied to estimating the 
continental distribution of kangaroos (Walker & 
Cocks, 1991). However, BIOCLIM (now BIOMAP) 
remains the most extensively used of the heuristic
methods for presence data. See Austin et al. 
(1994) for a further review of presence methods. 
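
The difference between a rectangular and a polygonal envelope can be sketched in two climate dimensions. The example below uses a convex hull as a stand-in for the polygonal envelope; this is an assumption made for illustration and is not claimed to be the HABITAT algorithm. The presence points are hypothetical (mean annual temperature in °C, annual rainfall in mm).

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(point, hull):
    """True if the point lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


def in_rectangle(point, points):
    """True if the point lies within the min/max range of each variable."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs) <= point[0] <= max(xs) and min(ys) <= point[1] <= max(ys)


# Hypothetical presence records running from cool-dry to warm-wet sites.
presence = [(8, 600), (10, 800), (12, 1000), (14, 1200), (10, 900), (12, 900)]
hull = convex_hull(presence)

warm_dry = (14, 600)      # a climate combination with no nearby records
mid_range = (11, 900)     # a combination well inside the cloud of records

rect_accepts_warm_dry = in_rectangle(warm_dry, presence)
hull_accepts_warm_dry = in_hull(warm_dry, hull)
hull_accepts_mid_range = in_hull(mid_range, hull)
```

The warm-dry corner falls inside the rectangle but outside the hull, which is the sense in which a polygonal envelope is the more conservative of the two.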

To provide better data, herbarium records should 
contain precise locations and consistent environmental
information. A preferable minimum data set
is presence/absence data for all species in a standard
set of taxa from plots collected as part of a
vegetation survey. In any survey, absence is con- 
ditional on the sampling effort made at a site. Large 
databases that are capable of supporting statistical 
modeling can be built up by collating such data 
from existing surveys (Austin et al., 1990; Leath- 
wick & Mitchell, 1992). The principal weakness in 
such data sets is the unknown sampling bias in the 
original selection of the plots. To make most use of 
databases they should be ecological in nature rath- 
er than taxonomic. Margules and Austin (1994) 
have discussed the requirements for establishing 
such a database, listing four requirements: (a) a 
conceptual framework based on ecological theory; 
(b) field data obtained from sites using survey de- 
sign principles based explicitly on the conceptual 
framework; (c) a rationale for determining which 
measurements should be made at the chosen sites 
in addition to the floristic records; and (d) appro- 
priate statistical methods for analyzing survey data 
and predicting (extrapolating) the regional distri- 
bution of species from the point records. These au- 
thors failed to emphasize that this is only possible 
if the database is linked to a GIS. 


SURVEY DESIGN 


How to obtain a representative sample of the veg- 
etation variation in a region is a central question for 
conservation evaluation. Vegetation surveys of large 
areas are expensive and time-consuming, particular- 
ly if random or systematic sampling is undertaken 
in rugged or inaccessible regions (Burbidge, 1991; 
see also Tuomisto, 1998). Cost-effective methods are 
required. In Australia, modifications by Austin and 
Heyligers (1989, 1991) of the gradsect sampling ap- 
proach first proposed by Gillison and Brewer (1985) 
provide an example of an explicit, consistent, and 
repeatable method. Unlike many sampling strategies 
that produce unbiased estimates of some mean val- 
ue, e.g., basal area of timber per unit area in the 
region, vegetation surveys should be directed toward 
obtaining a representative sample of the range of 
variation in vegetation composition. The detection 
of unusual combinations of species is as important 
as accurate estimates of the average composition of 
the commonest forest types. The method proposed 
by Austin and Heyligers (1989) is based on sampling 
vegetation from all possible combinations of selected 
environmental variables. The logistics of surveys, 
e.g., travel time between sampling sites, add consid- 
erably to the costs. Sampling along a transect is very 
cost-effective in travel time. If the transect is ori- 
ented along the steepest environmental gradient in 
a region (i.e., a gradsect), then different environments
can be sampled with less effort. Where such
a gradsect is positioned along an access route, then
a very cost-effective although biased survey is obtained.

Figure 2. The position of four gradsects selected for
a region of the north coast of New South Wales, showing
the extent to which they sample a particular altitude/rainfall
class (mean annual rainfall 1000-1399 mm and altitude
180-540 m). Individual squares represent 1 km².
Redrawn from Austin and Heyligers (1991).

Austin and Heyligers (1989, 1991) designed a 
survey of the forest vegetation of coastal northern 
New South Wales (NSW) based on the principles 
outlined above. The area surveyed was 24,000 km²,
and the floristic data consisted of presence/absence
data of tree species recorded from a 50 × 20 m plot
oriented along the contour with estimates of the 
ranking of the dominant species. The protocol used 
consisted of seven steps: (1) Identify the major en- 
vironmental variables influencing the distribution 
patterns of the vegetation in the study region. For 
their region these were temperature, rainfall, radia- 
tion, and nutrients. (2) Recognize a set of variables 
best suited because of their availability and practi- 
cality to determine the position and direction of the 
gradsects. For the north coast region these were al- 
titude (an easily measured and highly correlated sur- 
rogate for temperature), mean annual rainfall, and 
lithology (crude surrogate for soil nutrient content). 
(3) Select gradsects using these variables and the 
best available technology. Figure 2 shows the extent 
to which the four selected gradsects sample one par- 
ticular combination of altitude and rainfall. (4) Strat- 
ify the gradsects into geographical segments and 
stratify the environment within segments to provide 
replicate sampling of different environmental combinations
at different locations (Fig. 2). (5) Stratify
at the local scale, i.e., within the 1-km resolution
used in positioning the gradsects to take account of
other important environmental determinants of vegetation.
In this case five topographic positions as a
surrogate for solar radiation were sampled within
each 1-km gridcell selected. (6) Decide the effort to
be spent sampling the rarest environmental combinations
as compared with increased replication of
the commonest combinations. Determine the location
of samples by selecting random coordinates and taking
the closest suitable cell with adequate access.
Table 1 shows an example of the survey design for
a particular segment. (7) Review assumptions regarding
importance of environmental variables on
which the survey was designed and modify if necessary
before completing the survey. Austin and
Heyligers (1989) modified the survey design after
finding that depth to water table had an overriding
influence in the coastal lowlands.

Table 1. Example of survey design for a segment. Size
of the environmental cells and their sampling frequency
for the middle segment of the southern segment. Reproduced
from Austin & Heyligers (1989).

Rock- Altitude Rainfall classes
type classes 2 3 4 5 6
1
2 4 (x)
3 3 3 (1) 45 (1)
4 2 (x)
5
1
2
6 3 1 (0)
4 1 (0)
5
1 2 (1) 15 (1) 26 (1)
2 16 (1) 37 (1) 38 (1) 1 (0)
8 3 1 (0) 34 (1) 22 (1) 12 (1)
4 1 (0)
5
1 11 (1) 30 (1) 2)
2 55 (2) 17 (1) 28 (1) 1 (0)
9 3 86 (2) 75 (2) 4(%) 2 (1)
4 1 (0) 26 (1)
5 1 (0)

The plain numbers refer to the total number of pixels in
an environmental cell. The numbers in parentheses refer
to the number of samples (a sample may consist of up to
5 plots each from different topographic positions) to be
sampled in each cell. An “x” indicates that these cells
were not easily accessible and no sample was to be taken.


Annals of the 
Missouri Botanical Garden 





The approach of Austin and Heyligers (1989) can 
be summarized as an SR³ strategy, that is, Stratification,
Representation, Replication, and Randomization.
It represents one realization of the first two
requirements of Margules and Austin (1994) for es- 
tablishing a database. A total of 1025 plots were 
sampled by Austin and Heyligers (1989), equal to an
area of 1.025 km², or approximately 1/24,000th of
the study area. The restriction of sampling to grad- 
sects means that not all locations have an equal 
chance of being selected in the sample, and there- 
fore the sample obtained is highly biased. However, 
the sample is representative, and the design is explicit,
consistent, and repeatable, which is not always
the case with biodiversity surveys at the present
time. While it is possible to design an SR³ survey
without a GIS, it is much easier if one is available. 
Modifications of it have since been used in northern 
NSW, Australia (Ferrier, pers. comm.), in Sri Lanka 
(Green & Gunawardena, 1993), and in South Africa 
(Wessels et al., in press). The approach makes two 
ecological assumptions. First, that vegetation varies 
continuously with environment, forming a continuum 
rather than discrete communities (Austin & Smith, 
1989), and therefore all combinations of environ- 
mental conditions should be sampled. Second, it as- 
sumes that the major environmental gradients in any 
given region are known or can reasonably be hy- 
pothesized. 
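
The stratification, replication, and randomization steps can be sketched as follows. This is a simplified illustration, not the published protocol: the allocation rule (`min_pixels`, `large_cell`), the field names, and the pixel counts are hypothetical, and samples are drawn directly from accessible pixels rather than by taking the closest suitable cell to a random coordinate.

```python
import random
from collections import defaultdict


def stratify(pixels):
    """Group 1-km pixels into environmental cells (rock type, altitude, rainfall)."""
    cells = defaultdict(list)
    for p in pixels:
        cells[(p["rock"], p["altitude_class"], p["rainfall_class"])].append(p)
    return cells


def allocate(cells, min_pixels=2, large_cell=50):
    """Hypothetical rule: no sample for tiny or inaccessible cells,
    one sample otherwise, two samples for large cells."""
    plan = {}
    for key, members in cells.items():
        accessible = sum(1 for p in members if p["accessible"])
        if len(members) < min_pixels or accessible == 0:
            plan[key] = 0
        else:
            wanted = 2 if len(members) >= large_cell else 1
            plan[key] = min(wanted, accessible)
    return plan


def draw_samples(cells, plan, seed=0):
    """Randomly choose the planned number of accessible pixels in each cell."""
    rng = random.Random(seed)
    chosen = {}
    for key, n in plan.items():
        accessible = [p for p in cells[key] if p["accessible"]]
        chosen[key] = rng.sample(accessible, n)
    return chosen


def make_pixels(rock, altitude, rainfall, count, accessible=True):
    """Build hypothetical pixels that share one environmental combination."""
    return [{"rock": rock, "altitude_class": altitude,
             "rainfall_class": rainfall, "accessible": accessible}
            for _ in range(count)]


pixels = (make_pixels(9, 3, 2, 60) + make_pixels(8, 2, 3, 10)
          + make_pixels(6, 3, 4, 1) + make_pixels(3, 2, 2, 4, accessible=False))
cells = stratify(pixels)
plan = allocate(cells)
samples = draw_samples(cells, plan)
```

The plan makes the trade-off in step (6) explicit: effort spent on rare combinations versus extra replication of common ones is controlled by two thresholds that can be reviewed and changed before fieldwork.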

The comparative performance of gradsect sam- 
pling has been evaluated by Austin and Cawsey 
(1991) with artificial data and by Wessels et al. (in 
press) with survey data for birds and dung beetles 
in South Africa. Both studies support the cost-ef- 
fectiveness of gradsect sampling against systematic, 
random, and various purposive methods. Detailed 
attention has been given to this example of survey 
design because vegetation survey methods appear
to be a neglected topic (Greig-Smith, 1983; Jong- 
man et al., 1987; Kent & Coker, 1992), though see 
Noy-Meir (1971) for an early Australian example. 
A variety of alternative designs have been devel- 
oped in Australia; see Noy-Meir (1971), Austin and 
Basinski (1979), Margules and Nicholls (1987), 
Prober and Austin (1990), McKenzie et al. (1991), 
and Neave et al. (1996). 


STATISTICAL MODELING 


Availability of survey data consisting of pres- 
ence/absence for species plus information on en- 
vironmental variables from a GIS allows the pre- 
diction of species distributions using statistical 
modeling. Statistical modeling is no longer restricted
to quantitative data with normal errors (McCullagh
& Nelder, 1989), as many botanists assume.
There is currently a wide variety of
prediction methods extending well beyond the usu- 
al statistical methods to neural nets (Aleksander & 
Morton, 1990; Fitzgerald & Lees, 1992), genetic 
algorithms (Holland, 1992; Lees, 1994), and deci- 
sion trees (Breiman et al., 1984; Lees & Ritman, 
1991). A recent evaluation of many of these meth- 
ods (Austin et al., 1994a, 1995; Austin & Meyers, 
1996) for analyzing plant ecological data concluded 
that while most techniques can be found to have 
advantages under certain circumstances, statistical 
models perform better with typical vegetation survey
data. Franklin (1995) provided a review of recent
work from a geographer’s perspective. The two sta- 
tistical modeling methods that are currently being 
actively used are Generalized Linear Models (GLM; 
McCullagh & Nelder, 1989) and Generalized Ad- 
ditive Models (GAM; Hastie & Tibshirani, 1990). 
Examples of the use of GLM with vegetation data 
are Austin et al. (1990, 1994b) and Leathwick and 
Mitchell (1992). Nicholls (1989, 1991) provided a 
detailed discussion with examples of how to use 
GLM with vegetation survey data. The more recent 
technique of GAM was introduced to plant ecology 
by Yee and Mitchell (1991); Leathwick (1995) used
it to study the climatic relationships of New Zea- 
land tree species. Austin and Meyers (1996) com- 
pared GLM and GAM for Eucalyptus forest species 
and discussed their role in the management of for- 
est biodiversity. Recently GAMs have been used for 
predicting flora and fauna distributions for a large 
area of northwestern NSW (NSW NPWS, 1994a, b) 
and to derive predicted vegetation communities for 
the south coast of NSW in an unpublished CSIRO 
consultancy report in 1996. 

Statistical models such as GLM are used for the 
prediction of a response variable (or dependent 
variable) from a set of predictor (or independent) 
variables. One advantage of GLM over the classical 
regression method is that it allows error functions 
other than the normal, and hence the use of density 
or even binary data is possible. GAM, a non-para- 
metric technique, has the additional advantage that 
the mathematical function describing the shape of 
the curve relating the response variable to a pre- 
dictor variable need not be specified precisely. A 
smoothing spline is fitted to the data, and only the 
number of inflections in the curve need be specified,
not whether it is a polynomial or exponential
function. 
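
A GLM for binary data can be sketched as a logistic regression (binomial error, logit link) fitted by Newton-Raphson iteration. The example below is a minimal illustration on hypothetical presence/absence counts along a temperature gradient; it is not the software used in the studies cited. A quadratic term gives a symmetric bell-shaped response on the probability scale, which is exactly the parametric shape a GAM would replace with a fitted smoothing spline (not implemented here).

```python
import math


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(rows, y, iterations=25):
    """Newton-Raphson for logistic regression: beta += (X'WX)^-1 X'(y - p)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += x[i] * (yi - p)
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def design_row(temp):
    """Intercept, linear and quadratic terms of centred, scaled temperature."""
    x = (temp - 11.0) / 5.0
    return [1.0, x, x * x]


# Hypothetical survey: 10 plots at each temperature, with this many presences.
temps = range(6, 17)
present_out_of_10 = [0, 1, 3, 6, 8, 9, 8, 6, 3, 1, 0]
rows, y = [], []
for t, k in zip(temps, present_out_of_10):
    for i in range(10):
        rows.append(design_row(t))
        y.append(1.0 if i < k else 0.0)

beta = fit_logistic(rows, y)


def predict(temp):
    """Predicted probability of occurrence at a given mean annual temperature."""
    eta = sum(b * v for b, v in zip(beta, design_row(temp)))
    return 1.0 / (1.0 + math.exp(-eta))
```

A negative quadratic coefficient produces the unimodal curve; a skewed response cannot be represented by this model, which is one motivation for the more flexible functions discussed in the text.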

Volume 85, Number 1
1998

The key problem in the model-building process
for GLM use with vegetation data has been the shape
of the response of a plant species to environmental
predictors. Ecological theory is needed to define a
reasonable set of potential responses. The evidence
regarding the existence of the bell-shaped response 
usually presented in textbooks is ambiguous (Austin 
& Smith, 1989), and more flexible curves need to 
be considered. The β-function is one complex function
that has been proposed (Austin et al., 1994b).
It requires definition of the limits of a species distribution
along an environmental gradient within
which a variety of skewed or symmetric curves can
be represented by β-functions with different parameter
values. Austin et al. (1994b) fitted a β-function
for temperature to data for nine species of Eucalyp- 
tus (Myrtaceae). No species had a symmetric re- 
sponse shape; all were skewed and the patterns of 
skewness were dependent on position along the en- 
vironmental gradient of mean annual temperature 
(Austin et al., 1994b). The results were confirmed 
for a larger set of Eucalyptus species (Austin & Gay- 
wood, 1994). Their conclusions suggest that species 
distributions along gradients have well-specified 
skewed shapes and nonrandom patterns. If these 
patterns are found in other suitable data sets, then 
it may be possible to propose rules regarding the 
biodiversity patterns to be found in vegetation. Data 
sets are needed where the length of the environmen- 
tal gradient sampled clearly exceeds the width of the 
environmental niche of the individual species, oth- 
erwise the species limits cannot be specified. Failure 
to appreciate this limitation to the use of β-functions
has resulted in controversy (Oksanen, 1997; Austin 
& Nicholls, 1997; see also Austin & Meyers, 1996). 
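
How one function with fixed limits can yield both symmetric and skewed responses can be shown with a short sketch. The form used, y = k (x - a)^alpha (b - x)^gamma between limits a and b, is the usual way such a function is written; the parameter names and values here are assumptions for illustration and are not fitted values from the studies cited. The limits of 7 and 16 °C echo the temperature range quoted for Eucalyptus fastigata in the case study below.

```python
def beta_response(x, a, b, alpha, gamma, k=1.0):
    """Response k * (x - a)**alpha * (b - x)**gamma inside (a, b), zero outside."""
    if x <= a or x >= b:
        return 0.0
    return k * (x - a) ** alpha * (b - x) ** gamma


def mode(a, b, alpha, gamma):
    """Position of the maximum, from setting the derivative of the log response to zero."""
    return a + (b - a) * alpha / (alpha + gamma)


lower_limit, upper_limit = 7.0, 16.0

# Equal exponents give a symmetric curve centred between the limits.
symmetric_peak = mode(lower_limit, upper_limit, alpha=2.0, gamma=2.0)

# Unequal exponents shift the peak toward one limit, giving a skewed curve.
skewed_peak = mode(lower_limit, upper_limit, alpha=1.0, gamma=3.0)

outside = beta_response(6.0, lower_limit, upper_limit, 1.0, 3.0)
at_peak = beta_response(skewed_peak, lower_limit, upper_limit, 1.0, 3.0)
at_centre = beta_response(11.5, lower_limit, upper_limit, 1.0, 3.0)
```

The function is undefined as a response model unless both limits are known, which is why the sampled gradient must extend beyond the species' niche on both sides.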

It is the difficulty of specifying the exact form of 
the response shape that has led several researchers 
to use GAM (Yee & Mitchell, 1991; Norton & 
Mitchell, 1993; NSW NPWS, 1994a, 1994b; Leath- 
wick, 1995). GAM, while conferring the advantage 
of a non-parametric smoothing function, is not with- 
out problems, e.g., the sensitivity of significance 
tests (Austin et al., 1995; Austin & Meyers, 1996), 
and is certainly not free of assumptions, contrary to the
assertion of Norton and Mitchell (1993). It is a “current best
practice method” for biodiversity analysis but is 
likely to undergo significant modifications in the 
near future as further evaluation is done. 


EUCALYPTUS FASTIGATA: A CASE STUDY IN GAM
PREDICTION OF SPECIES DISTRIBUTION 


The steps in modeling the distribution of a spe- 
cies based on available plot data from a region in 
southeastern NSW, Australia, are presented here.
The details of the study area have been published 
previously (Austin et al., 1990; Austin et al., 
1994b). Briefly, it is approximately 40,000 km² in
area and runs from just north of latitude 35°S to
the Victorian border, and from longitude 148°E to
the east coast of Australia. The climate varies con- 
siderably across the region: mean annual temper- 
ature ranges from 2.5°C on Mt. Kosciusko at 2200 
m to 16.9°C on the northern coastal plain, while 
mean annual rainfall varies from 480 mm to more 
than 2000 mm, with marked seasonal differences 
in rainfall patterns. Eucalyptus fastigata H. Deane 
& Maiden is a species characteristic of the coastal 
scarp forests between 400 m and 800 m (Fig. 3), 
and results of statistical models of its distribution 
using GLM have been published (Austin, 1992; 
Austin et al., 1994b). 


MODELING STEPS 


1. Collate available plot data for the defined re- 
gion. Details of the database and the contribu- 
tors can be found in Austin et al. (1994b). The 
current database has 9537 plots with records of 
the presence/absence for 273 tree species, and 
the geographical distribution of the plots is 
shown in Figure 3. 

2. Select and generate a set of appropriate envi- 
ronmental predictors. Climatic variables have 
been generated from a digital elevation model 
(DEM) using a variety of packages now incor- 
porated into BioRap (Hutchinson et al., 1997), 
and values for the plots obtained using a GIS. 
Those variables based on lithology were derived 
from a lithology GIS layer. Eleven environmen- 
tal variables were selected as potential predic- 
tors. These included eight continuous variables: 
average summer rainfall, average winter rainfall 
and rainfall seasonality (ratio of summer/winter 
rainfall), mean annual temperature, temperature 
of the hottest month and winter cold index, average
summer radiation and average winter radiation
(kJ/m²/day adjusted for slope and aspect);
and three factor or categorical variables:
topographic position (6), lithology class (6), and 
nutrient index (5) (figure in parentheses is the 
number of classes in the factor). Figure 4 shows 
the plots mapped into a climate space equivalent 
to the geographical space shown in Figure 3. 

3. Restrict the data, where the data extend well 
beyond the environmental niche of a species. 
For example, E. fastigata does not occur below 
7°C or above 16°C mean annual temperature 
(Fig. 4). Inclusion of those zero values beyond 
these limits can complicate the analysis and 
lead to poor prediction of the occurrence of the 
species near the limits (Austin & Meyers, 1996). 
The data are therefore restricted to those obser- 
vations that occur within limits set by having 
100 zero values above and below the last positive
observation, provided there are additional
observations beyond the limits; see Austin and
Meyers (1996) for further details.

4. Fit a GAM. The model was derived for presence/
absence data for E. fastigata, as predicted from
the 11 environmental variables using S-Plus
package (Statistical Sciences, 1993), with four
degrees of freedom for the continuous predictors.
All eleven predictors were included in the
model. The shapes of the responses differ markedly
for the different predictors (Fig. 5).

5. Use GIS to predict the distribution of the species
for unsampled areas in the region. This was
done using the predictive functions derived from
GAMs. The predicted distribution of E. fastigata
clearly shows the major zone of occurrence
along the coastal scarp (Fig. 6; cf. Fig. 3).

Figure 3. The geographical distribution of Eucalyptus fastigata as determined from the database of 9537 plots for
southeast NSW. Triangles indicate E. fastigata present; dots indicate plots without E. fastigata.

Figure 4. The distribution of Eucalyptus fastigata in a climate space defined by mean annual temperature and
mean monthly winter rainfall. Note E. fastigata is absent below 7°C and above 16°C mean annual temperature, with
numerous plots above and below those limits.

Austin 9
Australian Eucalypt Forests

Figure 5. The shape of the GAM response functions for 6 of the 11 predictors for Eucalyptus fastigata. Note that
the functions have been fitted only within limits for mean annual temperature and summer mean monthly rainfall.
[Figure 5 panels: summer mean monthly rainfall (mm); seasonality; mean annual temperature (°C); winter mean daily
radiation (kJ/m²/day); topography; nutrient index. Vertical axes run from 0.0 to 1.0.]
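
The data restriction in step 3 can be sketched as a small function. It is one reading of the rule described above (keep the observed range of presences plus a fixed number of zero values on either side), with hypothetical data; the function and argument names are assumptions, and Austin and Meyers (1996) should be consulted for the procedure actually used.

```python
def restrict_to_niche(observations, n_zeros=100):
    """Keep plots between the first and last presence along a gradient,
    plus up to n_zeros absence plots beyond each end.

    observations: list of (gradient_value, present) pairs, present being 0 or 1.
    Returns the retained pairs, sorted by gradient value.
    """
    ordered = sorted(observations, key=lambda obs: obs[0])
    presence_positions = [i for i, (_, present) in enumerate(ordered) if present]
    if not presence_positions:
        return []
    first, last = presence_positions[0], presence_positions[-1]
    # Everything before `first` and after `last` is an absence by construction,
    # so extending the window by n_zeros plots adds exactly that many zeros,
    # or fewer if the data do not extend that far beyond the limits.
    start = max(0, first - n_zeros)
    stop = min(len(ordered), last + n_zeros + 1)
    return ordered[start:stop]


# Hypothetical gradient in arbitrary integer units: 300 absences below the
# species' range, 50 presences within it, and 300 absences above it.
observations = ([(v, 0) for v in range(0, 300)]
                + [(v, 1) for v in range(500, 550)]
                + [(v, 0) for v in range(800, 1100)])

restricted = restrict_to_niche(observations)
retained_values = [value for value, _ in restricted]
```

With these data the 650 plots are reduced to 250, so the many uninformative zeros far outside the niche no longer dominate the fit near the limits.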


These models can be used to investigate current 
ecological problems of relevance to our future man- 
agement of biodiversity. For example, where would 
Eucalyptus fastigata occur if global warming re- 
sulted in a 2°C rise in regional temperature and 


local increases in rainfall? The predicted geograph- 
ical distribution after such a change is shown in 
Figure 7. Eucalyptus fastigata would undergo a 
substantial reduction in occurrence on the coastal 
scarp under such a scenario. Note that this is a 
static analysis ignoring problems of dispersal, time 
to equilibrium, and changed competitive interac- 
tions. The role of environmental niche models in 
relation to climate change models and physiological 
growth models was reviewed by Austin (1992), with 
particular reference to E. fastigata. 
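
The scenario analysis amounts to perturbing the predictors for every grid cell and re-applying the fitted functions. The sketch below assumes the rainfall changes stated in the caption to Figure 7 (20% in coastal and tableland regions, 10% in the western region); the field names, the zone labels, and `toy_predict`, a crude stand-in for a fitted GAM, are hypothetical. Only mean annual temperature is shifted here, although other temperature predictors would also change in a full analysis.

```python
def apply_scenario(cell, warming=2.0, rain_factor_by_zone=None):
    """Return a copy of a grid cell's predictors under the warming scenario."""
    factors = rain_factor_by_zone or {
        "coastal": 1.20, "tableland": 1.20, "western": 1.10}
    changed = dict(cell)
    changed["mean_annual_temp"] = cell["mean_annual_temp"] + warming
    factor = factors[cell["zone"]]
    changed["summer_rain"] = cell["summer_rain"] * factor
    changed["winter_rain"] = cell["winter_rain"] * factor
    return changed


def predict_change(cells, predict):
    """Pair current and scenario predictions for each cell.

    `predict` is any callable mapping a cell's predictors to a probability.
    """
    return [(predict(c), predict(apply_scenario(c))) for c in cells]


def toy_predict(cell):
    """Hypothetical stand-in for a fitted model: present only between 7 and 16 °C."""
    return 1.0 if 7.0 <= cell["mean_annual_temp"] <= 16.0 else 0.0


cells = [
    {"zone": "coastal", "mean_annual_temp": 15.0,
     "summer_rain": 100.0, "winter_rain": 80.0},
    {"zone": "western", "mean_annual_temp": 9.0,
     "summer_rain": 60.0, "winter_rain": 50.0},
]
changes = predict_change(cells, toy_predict)
```

As the text notes, such a comparison is static: it says where the climate would suit the species, not whether or when the species would arrive there.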

The above procedure is explicit, repeatable, and 
consistent. There are both statistical and ecological 
research issues still to be resolved about the best 
procedure. Austin and Meyers (1996; see also Austin et al.,
1995) have examined the performance of GLM,
GAM, and regression trees on both real data and 
simulated data where truth is known. They con- 
clude that a mixed strategy using both GLM and 
GAM functions is desirable. They suggest that the 
best results are as dependent on the availability of 
suitable ecological and statistical skills as on the 
particular procedure used. The explicit nature of
the models for individual species provides a firm
basis on which to build an improved understanding
of species distribution patterns. The ad hoc mapping
of vegetation and species based on unknown
mental models derived from an unknown, arbitrary,
and imperfectly remembered database is no longer
adequate. However, it must also be remembered that
these models are only as good as the data and the
ecological assumptions on which the predictors are
selected, and that they are based on correlation,
not causation.

Figure 6. The predicted geographical distribution of Eucalyptus fastigata in terms of probability of occurrence using
the GAM functions and a GIS for the coastal zone (outlined area) of New South Wales.


Figure 7. The predicted distribution of Eucalyptus fastigata if mean annual temperature increased by two degrees
and rainfall by 20% in coastal and tableland regions and 10% in the western region for the same region as Figure 6.


SPECIES RICHNESS


Statistical models like GLM can be used in other
contexts more relevant to evolutionary botany and
its interface with ecology (Currie, 1991). Austin et
al. (1996) investigated patterns of tree species richness
in southern NSW using a data set similar to, but
smaller than (7208 plots), that used for Eucalyptus
fastigata above. Predictors similar to those of Austin
et al. (1994b) were used, namely mean annual
temperature, mean annual rainfall, mean annual 
daily radiation, and four categorical variables (to- 
pographic position, lithology, nutrient index, and 
rainfall seasonality). Total tree-species richness for
0.1-ha plots was predicted as the dependent or response
variable using GLM, with cubic polynomial
functions for the continuous variables and interac- 
tion terms for temperature and rainfall. Regional 
scale patterns of species richness are predictable 
from the environment, with mean annual tempera- 
ture the most important predictor. Maximum spe- 
cies richness for trees was found in protected gul- 
lies at temperatures >16°C, with rainfall >900 
mm, on volcanic soils with intermediate or high nutrient
levels (Austin et al., 1996). This habitat represents
the limited conditions under which species of the
species-rich warm temperate rainforest can
survive in the fire-prone eucalypt-dominated forests
of the region. Various components of tree-species
richness can be recognized. For example, there are
numerous species of Eucalyptus in the region, and
analyses of the species richness patterns were made
for two of the subgenera, Monocalyptus and Symphyomyrtus.
All predictors except seasonality of
rainfall were significant for the subgenus Monocalyptus,
and there was a complex skewed response
to temperature and rainfall (Fig. 8). Maximum species
richness for Monocalyptus was predicted for
exposed ridges on sediments or granites under
low-nutrient conditions in temperate climates.
The other subgenus, Symphyomyrtus, showed
a distinct complementary pattern and insensitivity
to radiation and topographic position, but species
richness also varied with seasonality of rainfall.
High species richness was associated with fertile
soils (Austin et al., 1996). There has been considerable
discussion about the differential behavior of
species from these two subgenera and their ability
to co-occur (Noble, 1989). The descriptive models
obtained by GLM analysis are consistent with the
conclusions reached in the literature review of Noble
(1989).

Figure 8. The predicted distribution of species richness
for Eucalyptus subg. Monocalyptus in relation to climatic
predictors on exposed ridges with high radiation and
low nutrients, with soft sedimentary lithology.
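
The structure of the richness model described above (cubic polynomials for the continuous predictors, a temperature by rainfall interaction, and categorical predictors) can be made concrete by writing out one row of the design matrix. The sketch below is an assumption-laden illustration. It uses a single product term as the simplest form of interaction and only one categorical predictor with hypothetical level names. It stops at the design terms because the error distribution and link function are not stated here.

```python
def cubic_terms(x):
    """Return the linear, quadratic, and cubic terms of a predictor."""
    return [x, x ** 2, x ** 3]


def dummy_terms(level, levels):
    """Indicator coding for a categorical predictor; first level is baseline."""
    return [1.0 if level == other else 0.0 for other in levels[1:]]


# Hypothetical level names for one categorical predictor.
TOPO_LEVELS = ["ridge", "slope", "gully"]


def design_row(temp_c, rain_mm, radiation, topo):
    """Build one row of the design matrix for a single 0.1-ha plot."""
    row = [1.0]                       # intercept
    row += cubic_terms(temp_c)        # mean annual temperature
    row += cubic_terms(rain_mm)       # mean annual rainfall
    row += cubic_terms(radiation)     # mean annual daily radiation
    row.append(temp_c * rain_mm)      # simplest temperature x rainfall term
    row += dummy_terms(topo, TOPO_LEVELS)
    return row


# Toy, unit-free values chosen so the terms are easy to read.
row = design_row(temp_c=2.0, rain_mm=3.0, radiation=1.0, topo="gully")
```

In practice the continuous predictors would normally be centered or scaled before forming polynomial terms, and the remaining categorical predictors would be coded in the same way.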

For the evolutionary botanist these results pose 
the question of why there are more species of subgenus
Monocalyptus co-occurring in some environments
than others. Figure 8 shows that there is an
optimum environment for numbers of species of
Monocalyptus; any theory of biodiversity and evo- 
lution should be able to explain the existence of 
such a pattern in environmental space. Managers 
of biodiversity also need to understand the rela- 
tionship between species richness and environ- 
ment. If individual species and species richness are 
both strongly related to environment, then the con- 
cept of a regional species pool needs to be re-ex- 
amined. 


APPLICATIONS 


An example of the use to which these computer- 
based tools are being put in Australia is an unpub- 
lished consultancy report by the CSIRO Division of 
Wildlife and Ecology for the NSW National Parks 
and Wildlife Service. The objective of the consultancy
was to map the pre-European forest vegetation
(pre-1750) at the scale of 1:100,000, such that
the percentage of the pre-European communities 
still surviving could be estimated. This information 
would then be used to determine which currently 
forested areas should be conserved, which logged, 
and which require further detailed examination. 
The region concerned was the Southern Coastal 
Zone of NSW. There was only a limited time avail- 
able to complete the study. However, the existing 
database and modeling studies described above, 
plus an appropriate GIS, provided a suitable basis 
for undertaking the study using modern methods. 

The ecological theory on which the study was 
based assumed that the vegetation formed a contin- 
uum such that the precise composition of the veg- 
etation varied continuously, and communities were 
a function of the frequency of particular environ- 
mental combinations in the landscape (Austin & 
Smith, 1989). Estimating individual species distri- 
butions using GAMs from existing data for relevant 
environments would allow spatial predictions of 
distributions for cleared areas. Combining the pre- 
dictions for individual species for each cell of a GIS 
gives an expected but continuously varying com- 
munity composition. This composition data can 
then be classified using numerical methods to give 
a consistent description of forest communities for 
the entire region. 
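
The idea of an expected, continuously varying composition can be sketched as follows: each species has its own response function, and the vector of predicted probabilities for a cell is that cell's composition. The three response functions, the species names, and the cell records below are hypothetical placeholders for fitted GAMs and GIS layers.

```python
import math


def logistic(eta):
    """Inverse logit: convert a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical per-species response functions standing in for fitted GAMs.
SPECIES_MODELS = {
    "species_a": lambda cell: logistic(0.8 * (cell["temp_c"] - 12.0)),
    "species_b": lambda cell: logistic(-0.8 * (cell["temp_c"] - 12.0)),
    "species_c": lambda cell: logistic(0.005 * (cell["rain_mm"] - 1000.0)),
}


def expected_composition(cell, models=SPECIES_MODELS):
    """Return predicted probabilities for one cell, in sorted species order."""
    return [models[name](cell) for name in sorted(models)]


# Hypothetical GIS cells; a cleared cell is treated like any other because
# the prediction depends only on its environmental values.
cells = {
    "cell_1": {"temp_c": 12.0, "rain_mm": 1000.0},
    "cell_2": {"temp_c": 15.0, "rain_mm": 800.0},
}
composition = {cid: expected_composition(c) for cid, c in cells.items()}
```

The resulting cell-by-species table is the input to the numerical classification.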

Figure 9. Pre-1750 mapping units. a (left): Savannah Woodlands (Eucalyptus melliodora/E. bridgesiana group).
b (right): Lowland Granite Communities (E. tereticornis group).

The steps involved practical decisions at each
stage, and these depend on the particular features
of each project. The steps are briefly described
here to indicate the types of problems that arise.


1. Data. The existing database consisted of 8377
plots. No data were recorded in the database for
the northeastern area of the zone. Additional
data were collated from that and other regions
in the zone. However, this was found to give a
very biased sample from the northeastern region;
most plots were from warm temperate rainforest
in gullies. An additional survey of the
northeastern region was necessary.

2. Survey. The design was based on the SR² strategy,
described previously. The gradsect approach
was not used, as access was not a major
limiting factor and distances were small. Each
of the three 1:100,000 maps containing parts of
the northeastern area was used as a geographical
stratum within which environmental combinations
based on mean annual temperature, mean an- 
nual rainfall, and lithology were mapped with a 
GIS. Sites were selected for second stage sam- 
pling by topographic position, as previously de- 
scribed (Austin & Heyligers, 1989). 

3. Modeling. The data finally consisted of 9537 
plots. After setting an acceptance criterion of at 
least 50 presence observations in order to in- 
clude a species in the GAM modeling, 88 tree 
species were modeled. To reduce the time taken 
to model the species, the same model was fitted 
to all species. The eleven predictors used for E. 
fastigata (see above) were used. Use of such a 
generic model ignoring significance levels will 
result in overspecification. The degree to which 
this reduces the accuracy of the models is the 
subject of current research. 


4. GIS. A GIS with a 1-ha resolution was available
containing all the necessary predictor variables,
so predictions for each of the 88 species were
possible for each of the 2.7 million cells in the
GIS in the 27,000 km² zone.

5. Classification. To provide a community classifi- 
cation of the zone, the 2.7 million pixels, each 
characterized by the probability of occurrence 
of 88 tree species, were used in a numerical 
classification using ALOC and UPGMA proce- 
dures in the package PATN (Belbin, 1995). The 
available computer facilities and time imposed
major limitations on the analysis of the large
species-by-pixel matrix.
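
The classification in step 5 can be illustrated on a very small scale. The sketch below implements only average-linkage (UPGMA) clustering in plain Python and does not reproduce PATN or the ALOC procedure. The Bray-Curtis dissimilarity is an assumption made for the example, since the measure used is not stated here, and the four pixel vectors are invented.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def upgma(vectors, n_groups):
    """Merge the closest clusters (average linkage) until n_groups remain."""
    clusters = [[i] for i in range(len(vectors))]

    def average_distance(c1, c2):
        total = sum(bray_curtis(vectors[i], vectors[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_groups:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: average_distance(clusters[pair[0]],
                                                     clusters[pair[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Hypothetical pixels, each a vector of occurrence probabilities for
# three species (the study used 88 species and 2.7 million pixels).
pixels = [
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
]
groups = upgma(pixels, n_groups=2)
```

This naive version recomputes all pairwise averages at every merge, so it is usable only for toy inputs. That cost is one way to see why computing facilities and time limited the analysis of the full matrix.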


The final stage was a manual reorganization of 
the classification dendrogram to provide mappable 
units. Vegetation composition is strongly controlled 
by aspect in the area, and classification units were 
grouped into catenary sequences to give spatially 
coherent units for mapping. Two levels of vegetation 
classification were recognized: classes roughly cor- 
responding to formations or alliances, and units ap- 
proximating communities. Figure 9 gives examples 
of the class maps obtained. These maps at the finer 
scale of units, when combined with a land-cover 
map showing remaining forest areas, were used to 
decide that a further 100,000 ha of forest needed
to be reserved in order to conserve an adequate
representation of the pre-1750 forest communities.
Pressey (in press) discusses the relationship be- 
tween the scientific analysis and the political pro- 
cess for a similar exercise for northern NSW. 

When information of this kind is available, then 
a further stage is reached where methods are need- 
ed to determine biodiversity priority areas. Explicit 
criteria are required, but the particular techniques 
are dependent on the available data. In general 
terms there are two classes of methods: (1) those 
that identify a set of areas in which all selected 
biodiversity attributes (e.g., species) are represent- 
ed a specified number of times, e.g., once, twice, 
or three times; (2) those that maximize the amount 
of biodiversity represented by a given number of 
areas (Margules & Redhead, 1995). In the first, the
level of representation is specified arbitrarily, while 
in the second it is the number of areas that is fixed. 
This has become a major area of research and in- 
novation where numerous constraints and trade-offs 
have been incorporated into the computer algo- 
rithms. In Australia, Margules and Nicholls (1987) 
pioneered an effective computer algorithm for the 
attribute-representation approach. Subsequently 
these authors with Pressey explored a number of 
the options with this approach (Nicholls & Mar- 
gules, 1993; Pressey & Nicholls, 1989a, b; Pressey, 
1994). Faith (1994) has developed a number of ap- 
proaches to the second class of methods, using 
measures of dissimilarity and ordination techniques 
(Faith & Walker, 1994, 1997; Faith & Nicholls, 
1997). Two features of the work by these authors 
are worthy of comment. First, the recognition that 
species richness per se is not a good criterion for 
conserving representative biodiversity; it is easily 
shown with simple examples that selecting the
richest site of three may result in conserving fewer
species than selecting the other two sites, each with
fewer species. Complementarity of site composition is
more important than maximal richness of individual 
sites. Second, the recognition that sophisticated al- 
gorithms are only valuable if they can be used with 
the limited and arbitrary data sets currently avail- 
able and enhance those data rather than hide their 
inadequacies. The BioRap manuals (Margules & 
Redhead, 1995; Boston, 1997; Hutchinson et al., 
1997; Faith & Nicholls, 1997; Noble, 1997) pro- 
vide case studies of the use of alternative methods 
with various types of data. 
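
The point about complementarity can be shown with a small worked example of the second class of methods, in which the number of areas is fixed. The site names and species lists below are invented, and the exhaustive search is used only because the example is tiny. It is not the algorithm of any of the cited authors, whose methods are designed for realistic problem sizes.

```python
from itertools import combinations

# Hypothetical species lists for three candidate sites.
sites = {
    "A": {"sp1", "sp2", "sp3", "sp4"},
    "B": {"sp1", "sp2", "sp5"},
    "C": {"sp3", "sp4", "sp6"},
}


def species_covered(chosen, sites):
    """Return the set of species represented in the chosen sites."""
    covered = set()
    for name in chosen:
        covered |= sites[name]
    return covered


def richest_first(sites, n_areas):
    """Pick the n_areas sites with the most species, ignoring overlap."""
    ranked = sorted(sites, key=lambda name: (-len(sites[name]), name))
    return ranked[:n_areas]


def best_complementary_set(sites, n_areas):
    """Exhaustively find the n_areas sites covering the most species."""
    return max(combinations(sorted(sites), n_areas),
               key=lambda chosen: len(species_covered(chosen, sites)))


by_richness = richest_first(sites, 2)
by_complement = best_complementary_set(sites, 2)
n_richness = len(species_covered(by_richness, sites))
n_complement = len(species_covered(by_complement, sites))
```

Here ranking by richness selects sites A and B and represents five species, whereas the complementary pair B and C represents all six, even though neither is the richest site.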


DISCUSSION 


Herbarium records, while a primary source of 
data, have their limitations for analysis of species 
distributions (Hall, 1994; Austin et al., 1994a; Soberón
et al., 1996). Data quality is a key issue, and
computer routines for examining records of a spe- 
cies’ distribution are an important first step (Chap- 
man & Busby, 1994). New approaches such as 
BIOCLIM (Nix, 1986; Busby, 1991) and HABITAT 
(Walker & Cocks, 1991) are examples of heuristic 
methods designed to overcome the limitations of 
presence data. One difficulty is that survey data 
will age taxonomically. Without voucher specimens, 
it will not be possible to update survey records to 
take account of taxonomic revisions. However, 
managing biological diversity will require better
data than herbarium presence records provide. Hai- 
la and Margules (1996) argued strongly that a nec- 
essary component of any practicable strategy for 
preserving the biological diversity of the earth is 
systematic field survey. They noted, however, that 
modern theoretical ecologists regard surveys as te- 
dious, mundane activities; yet such data are essen- 
tial to testing theory. Any survey has implicit in its 
design a set of ecological assumptions and a set of 
statistical assumptions; if these are not recognized 
and progressively improved upon, then maximum 
use will not be made of our limited survey re- 
sources. This paper has attempted to present some 
of the Australian experience in this area, but rapid 
changes are occurring as a result of society’s de- 
mands that decisions be made on the inadequate 
database that currently exists. Poor survey design 
and predictive modeling techniques are adding to 
the difficulties. A major reason for this is that much 
of the work is appearing in the “gray” literature, 
and is inaccessible to many conservation scientists 
who might otherwise use the improved methods and 
techniques if they were aware of them. This review 
suffers from this problem in that much of the Aus- 
tralian work, good and bad, has yet to be published 
in the international literature and only exists in in- 
ternal reports or reports published with small num- 
bers of copies. Electronic publication may solve 
this problem of access to the literature. 

Computer technology in various forms (remote
sensing, GIS, and statistical software) is being
used to create new tools for the study of biodiversity.
What is more important is that we are finding
new ways of thinking about the problems of study- 
ing biodiversity, whether it is how to design surveys 
or to develop new theories integrating ecology and 
evolution to better conserve our flora. Each stage 
in the study of biodiversity is now the subject of 
intense investigation in terms of basic research, 
conservation application, and cost-effectiveness 
(Margules & Austin, 1991). In addition, the results 
of such biodiversity studies are being incorporated 
into computer packages designed to facilitate community
decision-making in regional land-use plans
(Cocks et al., 1995) and are being actively used in 
conservation planning (Pressey, in press). A period 
of evaluation is now needed to determine which of 
these methods or tools are the best. 

At the present time our immediate pragmatic 
concern is to make the best possible use of the 
biodiversity data we currently have to make sen- 
sible conservation decisions. Margules and his col- 
leagues, in putting together the BioRap manuals 
and software for rapid assessment of biodiversity 
priority areas for the World Bank (with funding 
from AustAid), have shown how to make use of 
available data. The opportunity to constantly reit- 
erate the processes is one of the strongest argu- 
ments for having computer-based tools for all as- 
pects of biodiversity study: they can be repeated 
when necessary.